Abstract

We introduce a novel parametric family of symmetric information-theoretic
distances based on Jensen's inequality for a convex functional generator. In
particular, this family unifies the celebrated Jeffreys divergence with the
Jensen-Shannon divergence when the Shannon entropy generator is chosen. We then
design a generic algorithm to compute the unique centroid, defined as the
minimizer of the average divergence. This yields a smooth family of centroids
linking the Jeffreys centroid to the Jensen-Shannon centroid. Finally, we report
on our experimental results.
1 Introduction to statistical distances

The Shannon differential entropy [Cover and Thomas (1991)] of a continuous
probability distribution^1 $p$ measures the amount of uncertainty:

$$H(p) = \int p(x) \log \frac{1}{p(x)} \, dx = -\int p(x) \log p(x) \, dx. \tag{1}$$



The cross-entropy [Cover and Thomas (1991)] measures the amount of extra
bits required to compute a code based on an observed empirical probability
$\tilde{p}$ instead of the true probability $p$ (hidden by nature):

$$H(\tilde{p} : p) = \int \tilde{p}(x) \log \frac{1}{p(x)} \, dx = -\int \tilde{p}(x) \log p(x) \, dx. \tag{2}$$



The ":" notation emphasizes the oriented aspect [Cover and Thomas (1991)]
of the functional: $H(p : q) \neq H(q : p)$. The Kullback-Leibler divergence [Kullback and Leibler(1951)|Co
is a statistical distance measure computing the relative entropy as follows:

$$\mathrm{KL}(p : q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx \tag{3}$$
$$= H(p : q) - H(p) \geq 0. \tag{4}$$

This last inequality is called Gibbs' inequality [Cover and Thomas(1991)],
with equality if and only if $p = q$. We have $H(p : q) = H(p) + \mathrm{KL}(p : q)$.
The Kullback-Leibler divergence can be extended to unnormalized positive
distributions (or positive arrays in discrete cases) as follows:

$$\mathrm{KL}(p : q) = \int \left( p(x) \log \frac{p(x)}{q(x)} + q(x) - p(x) \right) dx \tag{5}$$
$$= eH(p : q) - eH(p) \geq 0, \tag{6}$$

with $eH(p : q) = \int \left( p(x) \log \frac{1}{q(x)} + q(x) \right) dx$ and $eH(p) = eH(p : p)$.

(Rényi, based on an axiomatic approach [Renyi(1961)], derived yet another
expression for the Kullback-Leibler divergence of unnormalized generalized
distributions.)
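
The quantities of Eqs. 1-6 can be checked numerically on discrete distributions. The following is only a minimal sketch: it replaces integrals by finite sums, uses natural logarithms, and assumes every $q_i$ is positive wherever $p_i$ is; the function names are illustrative and not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (with 0 log 0 = 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p : q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """Kullback-Leibler divergence KL(p : q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def extended_kl(p, q):
    """Extended KL for positive arrays: sum_i (p_i log(p_i / q_i) + q_i - p_i)."""
    return kl(p, q) + sum(q) - sum(p)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
# KL(p : q) and H(p : q) - H(p) agree up to rounding (Eq. 4).
print(kl(p, q), cross_entropy(p, q) - entropy(p))
# The extended divergence stays non-negative on unnormalized arrays (Eq. 6).
print(extended_kl([2.0, 1.0], [1.0, 3.0]))
```

On normalized inputs `extended_kl` coincides with `kl`, since the correction term `sum(q) - sum(p)` vanishes.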



1 For the sake of simplicity and without loss of generality, we consider the
probability density function $p$ of a continuous random variable $X \sim p$. For
multivariate densities $p$, the integral notation $\int$ denotes the corresponding
multi-dimensional integral, so that we write for short $\int p(x) \, dx = 1$. Our
results hold for probability mass functions, and probability measures in general.
Many applications in Information Retrieval Deselaers, Keysers, and Ney(2008)|Rubner, Puzicha, Tom;
(IR) require dealing with a symmetric distortion measure. Jeffreys divergence
[Jeffreys(1973)] (also called J-divergence) symmetrizes the oriented
Kullback-Leibler divergence as follows:

$$J(p, q) = \mathrm{KL}(p : q) + \mathrm{KL}(q : p) = J(q, p) \tag{7}$$
$$= H(p : q) + H(q : p) - (H(p) + H(q)) \tag{8}$$
$$= \int (p(x) - q(x)) \log \frac{p(x)}{q(x)} \, dx. \tag{9}$$

Here, we replaced ":" by "," in the distortion measure to emphasize the
symmetric property: $J(p, q) = J(q, p)$. Jeffreys divergence is interpreted as
twice the difference between the average of the cross-entropies and the average
of the entropies. One of the drawbacks of Jeffreys divergence is that it may be
unbounded and therefore numerically quite unstable to compute in practice. For
example, let $p = (p^i)_{i=1}^d$ and $q = (q^i)_{i=1}^d$ be frequency histograms
with $d$ bins (discrete distributions called multinomials); then
$J(p, q) \to \infty$ if there exists one bin $l \in \{1, ..., d\}$ such that $p^l$
is above some constant and $q^l \to 0$. In that case,
$p^l \log \frac{p^l}{q^l} \to \infty$.
To circumvent this unboundedness problem, the Jensen-Shannon divergence
was introduced in [Lin(1991)]. The Jensen-Shannon divergence symmetrizes
the Kullback-Leibler divergence by taking the average relative entropy of the
source distributions to the average distribution:



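$$\mathrm{JS}(p, q) = \frac{1}{2}\left( \mathrm{KL}\left(p : \frac{p+q}{2}\right) + \mathrm{KL}\left(q : \frac{p+q}{2}\right) \right) = \mathrm{JS}(q, p) \tag{10}$$
$$= \frac{1}{2}\left( H\left(p : \frac{p+q}{2}\right) - H(p) + H\left(q : \frac{p+q}{2}\right) - H(q) \right) \tag{11}$$
$$= \frac{1}{2}\left( H\left(p : \frac{p+q}{2}\right) + H\left(q : \frac{p+q}{2}\right) \right) - \frac{H(p) + H(q)}{2} \tag{12}$$
$$= H\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2} \geq 0. \tag{13}$$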



The Jensen-Shannon divergence always takes finite values, and its square root
yields a metric, satisfying the triangle inequality. Moreover, we have the
following information-theoretic inequality [Lin(1991)]:

$$0 \leq \mathrm{JS}(p, q) \leq \frac{1}{4} J(p, q). \tag{14}$$
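
The contrast between the two symmetrizations can be illustrated on a two-bin histogram in which one bin of $q$ shrinks toward zero. This is a small sketch with made-up inputs, using discrete sums and natural logarithms; it is not the experimental code of the paper.

```python
import math

def kl(p, q):
    """KL(p : q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    """Jeffreys divergence J(p, q) = KL(p : q) + KL(q : p)."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence as the average KL to the mid-distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, m) + kl(q, m))

p = [0.5, 0.5]
for eps in (1e-2, 1e-4, 1e-8):
    q = [1 - eps, eps]
    # J grows without bound as eps shrinks, while JS stays below log 2.
    print(eps, jeffreys(p, q), jensen_shannon(p, q))
```

With natural logarithms the Jensen-Shannon values remain below $\log 2 \approx 0.693$, whereas the Jeffreys values keep increasing as `eps` decreases.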

By introducing the K-divergence [Lin(1991)]:

$$K(p : q) = \int p(x) \log \frac{2 p(x)}{p(x) + q(x)} \, dx = \mathrm{KL}\left(p : \frac{p+q}{2}\right), \tag{15}$$

we interpret the Jensen-Shannon divergence as the Jeffreys symmetrization of
the K-divergence (see Eq. 7):

$$\mathrm{JS}(p, q) = \frac{1}{2}\left( K(p : q) + K(q : p) \right) \tag{16}$$
$$= H\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2}. \tag{17}$$



Consider the skewed K-divergence

$$K_\alpha(p : q) = \int p(x) \log \frac{p(x)}{(1-\alpha) p(x) + \alpha q(x)} \, dx, \tag{18}$$

and its symmetrized divergence

$$\mathrm{JS}_\alpha(p, q) = \frac{K_\alpha(p : q) + K_\alpha(q : p)}{2} = \mathrm{JS}_\alpha(q, p). \tag{19}$$

For $\alpha = \frac{1}{2}$, we find the Jensen-Shannon divergence:
$\mathrm{JS}(p, q) = \mathrm{JS}_{1/2}(p, q)$. For $\alpha = 1$, we obtain half of
Jeffreys divergence: $\mathrm{JS}_1(p, q) = \frac{1}{2} J(p, q)$. It turns out
that this family of $\alpha$-Jensen-Shannon divergences belongs to a broader family of
information-theoretic measures, called Ali-Silvey-Csiszar divergences |Csiszar(1967)|Ali and Silvey(196(
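
The two special cases of the skewed family can be verified on discrete distributions. The sketch below assumes strictly positive histograms and replaces the integral of Eq. 18 by a sum; the helper names are illustrative.

```python
import math

def skew_k(p, q, alpha):
    """K_alpha(p : q) = sum_i p_i log(p_i / ((1 - alpha) p_i + alpha q_i))."""
    return sum(pi * math.log(pi / ((1 - alpha) * pi + alpha * qi))
               for pi, qi in zip(p, q))

def skew_js(p, q, alpha):
    """Symmetrized skewed divergence JS_alpha(p, q) of Eq. 19."""
    return 0.5 * (skew_k(p, q, alpha) + skew_k(q, p, alpha))

def jeffreys(p, q):
    """J(p, q) = sum_i (p_i - q_i) log(p_i / q_i), as in Eq. 9."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def jensen_shannon(p, q):
    """JS(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2, as in Eq. 17."""
    h = lambda r: -sum(ri * math.log(ri) for ri in r)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return h(m) - (h(p) + h(q)) / 2

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(skew_js(p, q, 0.5), jensen_shannon(p, q))  # alpha = 1/2: Jensen-Shannon
print(skew_js(p, q, 1.0), 0.5 * jeffreys(p, q))  # alpha = 1: half of Jeffreys
```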

A $\phi$-divergence is defined for a strictly convex function $\phi$ such that
$\phi(1) = 0$ as:

$$I_\phi(p : q) = \int q(x) \, \phi\left(\frac{p(x)}{q(x)}\right) dx. \tag{20}$$

We can always symmetrize $\phi$-divergences by taking the coupled convex function
$\phi^*(x) = x \phi(\frac{1}{x})$. Indeed, we get

$$I_{\phi^*}(p : q) = \int q(x) \, \phi^*\left(\frac{p(x)}{q(x)}\right) dx \tag{21}$$
$$= \int q(x) \frac{p(x)}{q(x)} \, \phi\left(\frac{q(x)}{p(x)}\right) dx \tag{22}$$
$$= \int p(x) \, \phi\left(\frac{q(x)}{p(x)}\right) dx = I_\phi(q : p). \tag{23}$$

Therefore, $I_{\phi + \phi^*}(p, q)$ is a symmetric divergence. Let
$\phi^s = \phi + \phi^*$ denote the symmetrized generator. Jeffreys divergence is
the symmetrized $\phi$-divergence obtained for $\phi(u) = -\log u$ (with
$\phi^s(u) = (u - 1) \log u$). Similarly, Jensen-Shannon divergence is
interpreted as $\mathrm{JS}(p, q) = \frac{1}{2}(K(p : q) + K(q : p))$, with
$\frac{1}{2} K(p : q)$ a $\phi$-divergence for
$\phi(u) = \frac{u}{2} \log \frac{2u}{1+u}$, see [Lin(1991)]. It follows that
Jensen-Shannon is also a $\phi$-divergence. The $\alpha$-Jensen-Shannon divergences
are $\phi$-divergences for the generators $\phi^s_\alpha = \phi^*_\alpha + \phi_\alpha$
(up to the factor $\frac{1}{2}$ of Eq. 19), with
$\phi^*_\alpha(x) = -\log((1 - \alpha) + \alpha x)$ and
$\phi_\alpha(x) = -x \log((1 - \alpha) + \frac{\alpha}{x})$. $\alpha$-Jensen-Shannon
divergences are statistical distances that are convex in both arguments.
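
The symmetrization by the coupled generator can be sketched generically. The code below is an illustration on strictly positive discrete arrays (sums instead of integrals); `phi_divergence`, `conjugate`, and `symmetrized` are hypothetical helper names introduced only for this example.

```python
import math

def phi_divergence(p, q, phi):
    """I_phi(p : q) = sum_i q_i * phi(p_i / q_i) for strictly positive arrays."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))

def conjugate(phi):
    """Coupled generator phi*(u) = u * phi(1 / u), which swaps the arguments."""
    return lambda u: u * phi(1.0 / u)

def symmetrized(phi):
    """Symmetrized generator phi^s = phi + phi*."""
    star = conjugate(phi)
    return lambda u: phi(u) + star(u)

neg_log = lambda u: -math.log(u)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
# I_{phi*}(p : q) equals I_phi(q : p) (Eqs. 21-23).
print(phi_divergence(p, q, conjugate(neg_log)), phi_divergence(q, p, neg_log))
# phi = -log u gives a KL divergence; its symmetrized generator gives Jeffreys.
print(phi_divergence(p, q, symmetrized(neg_log)))
```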

One drawback of estimating $\alpha$-JS divergences on continuous parametric
densities (say, Gaussian distributions) is that the mixture of two Gaussians is
no longer a Gaussian, and therefore the average distribution falls outside
the family of considered distributions. This explains the lack of a closed-form
solution for computing the Jensen-Shannon divergence on Gaussians.

Next, we introduce a novel family of symmetrized divergences which yields
closed-form formulas for statistical distances of a large class of parametric
distributions, called statistical exponential families.



2 A novel parametric family of Jensen divergences 

At the heart of many statistical distances lies the celebrated Jensen's convex
inequality [Jensen(1906)]. For a strictly convex function $F$ and a parameter
$\alpha \in \mathbb{R} \setminus \{0, 1\}$, let us define the $\alpha$-skew Jensen
divergence as

$$J_F^{(\alpha)}(p : q) = \frac{1}{\alpha(1-\alpha)} \int \left( (1-\alpha) F(p(x)) + \alpha F(q(x)) - F((1-\alpha) p(x) + \alpha q(x)) \right) dx. \tag{24}$$

This statistical divergence is said to be separable as it can be rewritten as

$$J_F^{(\alpha)}(p : q) = \int j_F^{(\alpha)}(p(x) : q(x)) \, dx, \tag{25}$$

with the scalar basic distance being defined as

$$j_F^{(\alpha)}(x : y) = \frac{1}{\alpha(1-\alpha)} \left( (1-\alpha) F(x) + \alpha F(y) - F((1-\alpha) x + \alpha y) \right). \tag{26}$$

Furthermore, we extend the definition of the $\alpha$-skew Jensen divergence to
real-valued $d$-dimensional vectors $p$ and $q$ as

$$J_F^{(\alpha)}(p : q) = \sum_{i=1}^{d} j_F^{(\alpha)}(p^i : q^i). \tag{27}$$



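In the limit cases, we find the oriented Kullback-Leibler divergences [Nielsen and Boltz(2011)]
when we choose the generator $F(x) = x \log x$ (a first-order Taylor expansion of
$F((1-\alpha) p + \alpha q)$ in Eq. 24 around $\alpha = 0$, respectively $\alpha = 1$,
yields the Bregman divergence $B_F(q : p)$, respectively $B_F(p : q)$):

$$\lim_{\alpha \to 0} J_F^{(\alpha)}(p : q) = \mathrm{KL}(q : p), \tag{28}$$
$$\lim_{\alpha \to 1} J_F^{(\alpha)}(p : q) = \mathrm{KL}(p : q). \tag{29}$$

Observe also that $J_F^{(\alpha)}(q : p) = J_F^{(1-\alpha)}(p : q)$, and that
therefore $\alpha$-skew Jensen divergences are asymmetric distortion measures
(except for $\alpha = \frac{1}{2}$). Therefore, let us symmetrize those
$\alpha$-skew divergences by averaging the two orientations as follows: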



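$$sJ_F^{(\alpha)}(p, q) = \frac{1}{2}\left( J_F^{(\alpha)}(p : q) + J_F^{(\alpha)}(q : p) \right) \tag{30}$$
$$= \frac{1}{2}\left( J_F^{(\alpha)}(p : q) + J_F^{(1-\alpha)}(p : q) \right) \tag{31}$$
$$= \frac{1}{2\alpha(1-\alpha)} \int \left( F(p(x)) + F(q(x)) - F(\alpha p(x) + (1-\alpha) q(x)) - F((1-\alpha) p(x) + \alpha q(x)) \right) dx \tag{32}$$
$$= sJ_F^{(\alpha)}(q, p) = sJ_F^{(1-\alpha)}(p, q) \geq 0. \tag{33}$$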



For discrete $d$-dimensional parameter vectors $p$ and $q$ (with respective
coordinates $p^1, ..., p^d$ and $q^1, ..., q^d$), we define analogously the
symmetrized $\alpha$-skew divergences as follows:

$$sJ_F^{(\alpha)}(p, q) = \frac{1}{2\alpha(1-\alpha)} \sum_{i=1}^{d} \left( F(p^i) + F(q^i) - F(\alpha p^i + (1-\alpha) q^i) - F((1-\alpha) p^i + \alpha q^i) \right). \tag{34}$$

Those statistical divergences are separable as they can be written as
$sJ_F^{(\alpha)}(p, q) = \int sj_F^{(\alpha)}(p(x), q(x)) \, dx$, where
$sj_F^{(\alpha)}$ denotes the corresponding basic scalar distance measure:

$$sj_F^{(\alpha)}(x, y) = \frac{1}{2\alpha(1-\alpha)} \left( F(x) + F(y) - F(\alpha x + (1-\alpha) y) - F((1-\alpha) x + \alpha y) \right). \tag{35}$$

Figure 1 shows graphically this novel family of symmetric Jensen divergences
by depicting its associated basic scalar distance $sj_F^{(\alpha)}$ (it is enough
to consider $\alpha \in [0, \frac{1}{2}]$ since
$sJ_F^{(\alpha)} = sJ_F^{(1-\alpha)}$). Note that except for
$\alpha \in \{0, 1\}$, this family of divergences necessarily has the boundedness
property: $0 \leq sJ_F^{(\alpha)}(p, q) < \infty$, $\forall \alpha \notin \{0, 1\}$.

Fig. 1. A family of separable symmetric scalar Jensen divergences
$\{sJ_F^{(\alpha)}(p, q) = \int sj_F^{(\alpha)}(p(x), q(x)) \, dx\}_\alpha$ for
$\alpha \in (0, \frac{1}{2}]$ based on Jensen's convexity gap that includes both
Jeffreys divergence in the limit case $\alpha = 0$ and Jensen-Shannon divergence
for $\alpha = \frac{1}{2}$, for the Shannon information generator. Here, we plot
the scalar base distance $sj_F^{(\alpha)}$ that induces the statistical distance.

Consider the strictly convex generator $F(x) = x \log x$ (Shannon information
function or equivalently the negative Shannon entropy functional). Rewriting
the divergence for this generator, we get a family of symmetric Kullback-Leibler
divergences:

$$sKL^{(\alpha)}(p, q) = \frac{1}{2\alpha(1-\alpha)} \left( H(\alpha p + (1-\alpha) q) + H((1-\alpha) p + \alpha q) - (H(p) + H(q)) \right) \geq 0. \tag{36}$$

We have in the limit case:

$$\lim_{\alpha \to 0} sKL^{(\alpha)}(p, q) = \frac{1}{2} J(p, q) = sKL^{(0)}(p, q). \tag{37}$$

That is, symmetrized $\alpha$-Jensen divergences tend asymptotically to the Jeffreys
divergence (up to the constant factor $\frac{1}{2}$ induced by the normalization
of Eq. 36) for the Shannon information generator. Furthermore, consider the
case $\alpha = \frac{1}{2}$:

$$sKL^{(1/2)}(p, q) = 2 \left( 2 H\left(\frac{p+q}{2}\right) - (H(p) + H(q)) \right) = 4 \, \mathrm{JS}(p, q). \tag{38}$$

Thus this family of symmetric Kullback-Leibler divergences unifies the
Jensen-Shannon divergence ($\alpha = \frac{1}{2}$) with the Jeffreys divergence
($\alpha \to 0$), each up to a constant factor.

Theorem 1 There exists a parametric family of symmetric information-theoretic
divergences $\{sKL^{(\alpha)}\}_\alpha$ that unifies both Jeffreys J-divergence
($\alpha \to 0$) with Jensen-Shannon divergence ($\alpha = \frac{1}{2}$).
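
Both ends of the family can be checked numerically from the discrete definition of Eq. 34 with the generator $F(x) = x \log x$. This is a sketch on strictly positive histograms; a small positive $\alpha$ stands in for the limit $\alpha \to 0$, and with the $\frac{1}{2\alpha(1-\alpha)}$ normalization the value obtained there is numerically half of the Jeffreys divergence of Eq. 9.

```python
import math

def shannon_info(x):
    """Generator F(x) = x log x, with F(0) = 0 by continuity."""
    return x * math.log(x) if x > 0 else 0.0

def sym_skew_jensen(p, q, alpha, F=shannon_info):
    """Discrete symmetrized alpha-skew Jensen divergence (Eq. 34), alpha not in {0, 1}."""
    total = sum(F(pi) + F(qi)
                - F(alpha * pi + (1 - alpha) * qi)
                - F((1 - alpha) * pi + alpha * qi)
                for pi, qi in zip(p, q))
    return total / (2 * alpha * (1 - alpha))

def jeffreys(p, q):
    """J(p, q) = sum_i (p_i - q_i) log(p_i / q_i)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def jensen_shannon(p, q):
    """JS(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    h = lambda r: -sum(shannon_info(ri) for ri in r)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return h(m) - (h(p) + h(q)) / 2

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(sym_skew_jensen(p, q, 0.5), 4 * jensen_shannon(p, q))  # alpha = 1/2
print(sym_skew_jensen(p, q, 1e-6), 0.5 * jeffreys(p, q))     # alpha close to 0
```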
This result can be obtained by considering a skew average of distributions
instead of the one-half average of Eq. 15:

$$L_\alpha(p : q) = H((1-\alpha) p + \alpha q) - H(p). \tag{39}$$

It then follows that (see Eq. 7)

$$sKL^{(\alpha)}(p, q) = \frac{1}{2\alpha(1-\alpha)} \left( L_\alpha(p : q) + L_\alpha(q : p) \right). \tag{40}$$

Note that $L_{1/2}(p : q) + L_{1/2}(q : p) = K(p : q) + K(q : p) = 2 \, \mathrm{JS}(p, q)$,
so that $sKL^{(1/2)}(p, q) = 4 \, \mathrm{JS}(p, q)$. The scaling factor is due to
historical convention. However, $L_\alpha$ is in general not a $\phi$-divergence.

An alternative description of the symmetric family is given by

$$S_F^{(\alpha')}(p, q) = \frac{2}{1 - \alpha'^2} \left( F(p) + F(q) - F\left(\frac{1-\alpha'}{2} p + \frac{1+\alpha'}{2} q\right) - F\left(\frac{1+\alpha'}{2} p + \frac{1-\alpha'}{2} q\right) \right). \tag{41}$$

It can be checked that $sJ_F^{(\alpha)}(p, q) = S_F^{(\alpha')}(p, q)$ for
$\alpha' = 1 - 2\alpha$, since then $\frac{1-\alpha'}{2} = \alpha$ and
$1 - \alpha'^2 = 4\alpha(1-\alpha)$.

Many parametric distributions follow a regular structure called exponential
families. In the next section, we shall link that class of symmetric
$sJ^{(\alpha)}$-divergences to equivalent symmetric $\alpha$-Bhattacharyya
divergences computed on the distribution parameter space.



3 Case of statistical exponential families 



Many common statistical distributions are handled in the unified framework
of exponential families [Nielsen and Nock(2009); Nielsen and Boltz(2011)]. A
distribution is said to belong to an exponential family $E_F$ if its parametric
density can be canonically rewritten as

$$p_F(x; \theta) = \exp\left( \langle t(x), \theta \rangle - F(\theta) + k(x) \right), \tag{42}$$

where $\theta$ describes the member of the exponential family
$E_F = \{ p_F(x; \theta) \mid \theta \in \Theta \}$, characterized by the
log-normalizer $F(\theta)$, a convex differentiable function.
$\langle x, y \rangle$ denotes the inner product (e.g., $x^T y$ for vectors, see Nielsen and Nock(2009) [Nielsen and Boltz(2l
$t(x)$ is the sufficient statistic.

Discrete $d$-dimensional distributions (corresponding to frequency histograms
with $d$ non-empty bins in visual applications) are multinomials, an exponential
family with the dimension of the natural space $\Theta$ being $d - 1$ (the order
of the family). In information retrieval [Rubner, Puzicha, Tomasi and Buhmann(2001)],
one often needs to perform clustering on frequency histograms for building a
codebook to efficiently perform retrieval queries (e.g., the bag-of-words method
[Fei-Fei and Perona(2005)]).

It is known that the Kullback-Leibler divergence of members $p \sim E_F(\theta_p)$
and $q \sim E_F(\theta_q)$ of the same exponential family $E_F$ is equivalent to a
Bregman divergence on the swapped natural parameters [Banerjee et al.(2005)Banerjee, Merugu, Dhillon, and Ghc

$$\mathrm{KL}(p_F(x; \theta_p) : p_F(x; \theta_q)) = B_F(\theta_q : \theta_p), \tag{43}$$

where $B_F(\theta_q : \theta_p) = F(\theta_q) - F(\theta_p) - \langle \theta_q - \theta_p, \nabla F(\theta_p) \rangle$.

The Jeffreys J-divergence on members of the same exponential family (lhs.)
can be computed as a symmetrized Bregman divergence, yielding an equivalent
calculation on the natural parameter space (rhs.):

$$J(p_F(x; \theta_p), p_F(x; \theta_q)) = (\theta_p - \theta_q)^T (\nabla F(\theta_p) - \nabla F(\theta_q)). \tag{44}$$
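
Eqs. 43 and 44 can be checked on a concrete family. The sketch below uses the Poisson family as an assumed example (it is not discussed in the text): its natural parameter is $\theta = \log \lambda$, its sufficient statistic is $t(x) = x$, and its log-normalizer is $F(\theta) = e^\theta$. The divergences on the left-hand sides are approximated by truncating the sum over the support at a hypothetical cutoff `kmax`.

```python
import math

def log_pmf(k, lam):
    """Log of the Poisson mass function with rate lam at integer k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def kl_numeric(lam_p, lam_q, kmax=200):
    """KL(p : q) by direct (truncated) summation over the support."""
    return sum(math.exp(log_pmf(k, lam_p)) * (log_pmf(k, lam_p) - log_pmf(k, lam_q))
               for k in range(kmax))

def bregman(theta_a, theta_b, F=math.exp, grad_F=math.exp):
    """Scalar Bregman divergence B_F(theta_a : theta_b)."""
    return F(theta_a) - F(theta_b) - (theta_a - theta_b) * grad_F(theta_b)

lam_p, lam_q = 3.0, 5.0
theta_p, theta_q = math.log(lam_p), math.log(lam_q)
# Eq. 43: KL on densities equals the Bregman divergence on swapped parameters.
print(kl_numeric(lam_p, lam_q), bregman(theta_q, theta_p))
# Eq. 44: Jeffreys divergence computed from the natural parameters only.
jeffreys_closed = (theta_p - theta_q) * (math.exp(theta_p) - math.exp(theta_q))
print(kl_numeric(lam_p, lam_q) + kl_numeric(lam_q, lam_p), jeffreys_closed)
```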



Note that although the product of two exponential families is an exponential
family, it is not the case for the mixture of two exponential families. Indeed,
the mixture $(1 - \alpha) p + \alpha q$ does not in general belong to $E_F$.
Therefore, the Jensen-Shannon divergence on members of the same exponential
family cannot be computed directly from the natural parameters, since it
requires computing the entropy of the mixture distribution (with no known
generic closed form):

$$\mathrm{JS}(p = p_F(x; \theta_p), q = p_F(x; \theta_q)) = H\left(\frac{p+q}{2}\right) - \frac{H(p) + H(q)}{2}. \tag{45}$$



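In fact, it turns out that Eq. 43 is the limit case of the property that the
$\alpha$-skew Bhattacharyya divergence of members $p = p_F(x; \theta_p)$ and
$q = p_F(x; \theta_q)$ of the same exponential family $E_F$ is equivalent to an
$\alpha$-Jensen divergence defined on the natural parameters [Nielsen and Boltz(2011)]:

$$B^{(\alpha)}(p_F(x; \theta_p) : p_F(x; \theta_q)) = -\log \int p_F(x; \theta_p)^{\alpha} \, p_F(x; \theta_q)^{1-\alpha} \, dx \tag{46}$$
$$= \alpha(1-\alpha) \, J_F^{(1-\alpha)}(\theta_p : \theta_q), \tag{47}$$

with the $\alpha$-Jensen divergence of Eq. 24 evaluated on the $d$-dimensional
parameter vectors of the distributions, that is,

$$\alpha(1-\alpha) \, J_F^{(1-\alpha)}(\theta_p : \theta_q) = \alpha F(\theta_p) + (1-\alpha) F(\theta_q) - F(\alpha \theta_p + (1-\alpha) \theta_q). \tag{48}$$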
We can therefore symmetrize $\alpha$-skew Bhattacharyya divergences:

$$sB^{(\alpha)}(p_F(x; \theta_p), p_F(x; \theta_q)) = \frac{1}{2}\left( B^{(\alpha)}(p_F(x; \theta_p) : p_F(x; \theta_q)) + B^{(\alpha)}(p_F(x; \theta_q) : p_F(x; \theta_p)) \right) \tag{49}$$
$$= -\frac{1}{2} \log \left( \int p^{\alpha}(x) q^{1-\alpha}(x) \, dx \right) \left( \int p^{1-\alpha}(x) q^{\alpha}(x) \, dx \right) \tag{50}$$
$$= \alpha(1-\alpha) \, sJ_F^{(\alpha)}(\theta_p, \theta_q), \tag{51}$$

and obtain equivalently a symmetrized skew Jensen divergence on the natural
parameters.

Theorem 2 The symmetrized skew $\alpha$-Bhattacharyya divergence on members
of the same exponential family is equivalent to a symmetrized skew
$\alpha$-Jensen divergence defined for the log-normalizer and computed in the
natural parameter space.
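
The equivalence can be illustrated numerically. As an assumed example (not taken from the text), the sketch below uses the Poisson family, whose natural parameter is $\theta = \log \lambda$ and whose log-normalizer is $F(\theta) = e^\theta$; the Bhattacharyya coefficient is approximated by a truncated sum with a hypothetical cutoff `kmax`.

```python
import math

def log_pmf(k, lam):
    """Log of the Poisson mass function with rate lam at integer k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def skew_bhattacharyya(lam_p, lam_q, alpha, kmax=200):
    """B^(alpha)(p : q) = -log sum_k p(k)^alpha q(k)^(1 - alpha), truncated at kmax."""
    coeff = sum(math.exp(alpha * log_pmf(k, lam_p) + (1 - alpha) * log_pmf(k, lam_q))
                for k in range(kmax))
    return -math.log(coeff)

def sym_skew_jensen(tp, tq, alpha, F=math.exp):
    """Scalar symmetrized alpha-skew Jensen divergence on natural parameters."""
    num = (F(tp) + F(tq)
           - F(alpha * tp + (1 - alpha) * tq)
           - F((1 - alpha) * tp + alpha * tq))
    return num / (2 * alpha * (1 - alpha))

lam_p, lam_q, alpha = 3.0, 5.0, 0.3
sB = 0.5 * (skew_bhattacharyya(lam_p, lam_q, alpha)
            + skew_bhattacharyya(lam_q, lam_p, alpha))
sJ = sym_skew_jensen(math.log(lam_p), math.log(lam_q), alpha)
# The symmetrized Bhattacharyya value matches alpha (1 - alpha) sJ up to rounding.
print(sB, alpha * (1 - alpha) * sJ)
```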

Let us now consider computing centroids (say, for k-means clustering
applications [Banerjee et al.(2005)]).



4 Symmetrized skew $\alpha$-Jensen centroids

Consider the discrete symmetrized $\alpha$-Jensen divergences (no longer on
distributions but on $d$-dimensional parameter vectors). In particular, we get
for separable divergences:

$$sJ_F^{(\alpha)}(x, y) = \frac{1}{2\alpha(1-\alpha)} \sum_{i=1}^{d} \left( F(x^i) + F(y^i) - F(\alpha x^i + (1-\alpha) y^i) - F((1-\alpha) x^i + \alpha y^i) \right). \tag{52}$$

This family of discrete measures includes the extended Kullback-Leibler
divergence for unnormalized distributions by setting $F(x) = x \log x$. The
barycenter $b$ of $n$ points $p_1, ..., p_n$ is defined as the (unique) point that
minimizes the weighted average distance:

$$b = \arg\min_c \sum_{i=1}^{n} w_i \times sJ_F^{(\alpha)}(p_i, c), \tag{53}$$

for $w = (w_1, ..., w_n)$ a normalized weight vector ($\forall i$, $w_i \geq 0$ and
$\sum_i w_i = 1$). In particular, choosing $w_i = \frac{1}{n}$ for all $i$ yields
by convention the centroid. Note that the multiplicative factor in the energy
function of Eq. 53 does not impact the minimum. Thus we need to equivalently
minimize:

$$\min_c E(c) = \min_c \sum_{i=1}^{n} w_i \left( F(p_i) + F(c) - F(\alpha p_i + (1-\alpha) c) - F(\alpha c + (1-\alpha) p_i) \right). \tag{54}$$

Removing the constant terms (i.e., independent of $c$), this amounts to
minimizing the following energy functional ($\sum_i w_i = 1$):

$$\arg\min_c E(c) = \arg\min_c F(c) - \sum_{i=1}^{n} w_i \left( F(\alpha p_i + (1-\alpha) c) + F(\alpha c + (1-\alpha) p_i) \right). \tag{55}$$



Since $F$ is convex, $E$ is the sum of a convex function plus a concave
function. Therefore, we can apply the ConCave-Convex Procedure
[Sriperumbudur and Lanckriet(2009)] (CCCP), which is guaranteed to converge to a
(local) minimum. We thus avoid a gradient steepest-descent numerical
optimization, which requires tuning a learning step parameter.

Initializing

$$c_0 = \sum_{i=1}^{n} w_i p_i \tag{56}$$

to the Euclidean barycenter, we iteratively update as follows:

$$\nabla F(c_{t+1}) = \sum_{i=1}^{n} w_i \left( (1-\alpha) \nabla F(\alpha p_i + (1-\alpha) c_t) + \alpha \nabla F(\alpha c_t + (1-\alpha) p_i) \right). \tag{57}$$

That is,

$$c_{t+1} = (\nabla F)^{-1}\left( \sum_{i=1}^{n} w_i \left( (1-\alpha) \nabla F(\alpha p_i + (1-\alpha) c_t) + \alpha \nabla F(\alpha c_t + (1-\alpha) p_i) \right) \right). \tag{58}$$

(Observe that since $F$ is strictly convex, its Hessian $\nabla^2 F$ is
positive-definite so that the reciprocal gradient $(\nabla F)^{-1}$ is
well-defined, see [Rockafellar(1969)].)

In the limit case, we get the following fixed point equation:

$$c^* = (\nabla F)^{-1}\left( \sum_{i=1}^{n} w_i \left( (1-\alpha) \nabla F(\alpha p_i + (1-\alpha) c^*) + \alpha \nabla F(\alpha c^* + (1-\alpha) p_i) \right) \right). \tag{59}$$

This rule is a quasi-arithmetic mean, and can alternatively be initialized using
$c'_0 = (\nabla F)^{-1}\left( \sum_{i=1}^{n} w_i \nabla F(p_i) \right)$ instead of $c_0$.
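
The update of Eq. 58 can be sketched for a separable generator, where the gradient and its inverse act coordinate-wise. The code below is a minimal illustration, not the implementation used for the experiments: it assumes the generator $F(x) = x \log x - x$ (so that $\nabla F = \log$ and $(\nabla F)^{-1} = \exp$), strictly positive inputs, a fixed number of iterations instead of a convergence test, and made-up sample points.

```python
import math

def cccp_centroid(points, weights, alpha, grad_F, grad_F_inv, iters=100):
    """Iterate Eq. 58 coordinate-wise, starting from the Euclidean barycenter (Eq. 56)."""
    d = len(points[0])
    c = [sum(w * p[j] for w, p in zip(weights, points)) for j in range(d)]
    for _ in range(iters):
        c = [grad_F_inv(sum(
                 w * ((1 - alpha) * grad_F(alpha * p[j] + (1 - alpha) * c[j])
                      + alpha * grad_F(alpha * c[j] + (1 - alpha) * p[j]))
                 for w, p in zip(weights, points)))
             for j in range(d)]
    return c

def energy(c, points, weights, alpha, F):
    """Energy of Eq. 54 (the constant factor 1 / (2 alpha (1 - alpha)) is dropped)."""
    return sum(w * sum(F(pj) + F(cj)
                       - F(alpha * pj + (1 - alpha) * cj)
                       - F(alpha * cj + (1 - alpha) * pj)
                       for pj, cj in zip(p, c))
               for w, p in zip(weights, points))

F = lambda x: x * math.log(x) - x
points = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]  # made-up histograms
weights = [1 / 3] * 3
alpha = 0.25

c0 = cccp_centroid(points, weights, alpha, math.log, math.exp, iters=0)
c = cccp_centroid(points, weights, alpha, math.log, math.exp)
# The energy at the returned point is no larger than at the initialization.
print(c, energy(c0, points, weights, alpha, F), energy(c, points, weights, alpha, F))
```

The returned point is a positive array that is not renormalized; whether to project it back onto the probability simplex depends on the application.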
Let us instantiate this updating rule for $\alpha = \frac{1}{2}$ and
$w_i = \frac{1}{n}$ on Shannon and Burg information functions:

Shannon information: $F(x) = x \log x - x$, $\nabla F(x) = \log x$, $(\nabla F)^{-1}(x) = \exp x$.
Update: $c_{t+1} = \left( \prod_{i=1}^{n} \frac{c_t + p_i}{2} \right)^{1/n}$ (geometric update).

Burg information: $F(x) = -\log x$, $\nabla F(x) = -1/x$, $(\nabla F)^{-1}(x) = -1/x$.
Update: $c_{t+1} = \frac{n}{\sum_{i=1}^{n} \frac{2}{c_t + p_i}}$ (harmonic update).
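
For scalar inputs, the two closed-form updates at $\alpha = \frac{1}{2}$ and uniform weights can be iterated directly. This is a small sketch with made-up positive values and a fixed iteration count; the function names are illustrative.

```python
import math

def geometric_update(c, points):
    """Shannon generator: geometric mean of the midpoints (c + p_i) / 2."""
    return math.exp(sum(math.log((c + p) / 2) for p in points) / len(points))

def harmonic_update(c, points):
    """Burg generator: harmonic mean of the midpoints (c + p_i) / 2."""
    return len(points) / sum(2.0 / (c + p) for p in points)

def iterate(update, points, iters=200):
    """Start from the arithmetic mean and apply the update a fixed number of times."""
    c = sum(points) / len(points)
    for _ in range(iters):
        c = update(c, points)
    return c

points = [1.0, 2.0, 4.0]  # made-up positive scalars
c_geo = iterate(geometric_update, points)
c_har = iterate(harmonic_update, points)
# Both values are (numerical) fixed points of their respective update rules.
print(c_geo, c_har)
```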



Note that for Jeffreys ($\alpha = 0$) and Jensen-Shannon ($\alpha = \frac{1}{2}$)
divergences, the energy function is convex, and therefore the minimum is
necessarily unique. (In fact, both Jeffreys and Jensen-Shannon are two instances
of the class of convex Ali-Silvey-Csiszar divergences
[Csiszar(1967); Ali and Silvey(1966)].)

Since $\alpha$-JS divergences are $\phi$-divergences (convex in both arguments),
the barycenter with respect to $\alpha$-JS is unique, and can be computed
alternatively using any convex optimization technique. Ben-Tal et al. Ben-Tal et al.(1989)Ben-Tal, Charnes, and Tel
called those center points entropic means; they consider scalar values that can
be extended to dimension-wise separable divergences, but not to normalized
nor continuous distributions.

Theorem 3 The centroid of members of the same exponential family with
respect to the symmetrized $\alpha$-Bhattacharyya divergence can be computed
equivalently as the centroid of their natural parameters with respect to the
symmetrized $\alpha$-Jensen divergence using the concave-convex procedure.

Note that for members of the same exponential family, both the $c_0$ and $c'_0$
initializations are interpreted as left-sided or right-sided Kullback-Leibler
centroids [Nielsen and Nock(2009)].



5 Experimental results 



Statistical distances play important roles either in supervised classification
tasks (e.g., interclass distance measure for feature subset selection methods |(Molina et al. (2 002))Molin
or in unsupervised clustering (e.g., centroid-based k-means Banerjee et al.(2005)Banerjee, Merugu, Dhi
In unsupervised settings, statistical divergences are used both in the
clustering preprocessing stage for building a codebook (e.g., using the k-means
algorithm [Banerjee et al.(2005)]), and when answering on-line queries (e.g.,
classification using the nearest neighbor rule).
Fig. 2. Performance of the symmetrized $\alpha$-skew Jensen divergences for a
binary classification task (y-axis, percentage of correct classification) with
respect to $\alpha \in [0, \frac{1}{2}]$ (x-axis).

Since we proposed a novel parametric family of symmetric statistical
divergences linking continuously the Jeffreys divergence to the Jensen-Shannon
divergence, let us study the impact of the $\alpha$ parameter in a toy application.
Namely, we consider binary classification of images: given a set of annotated
images either with tag 1 or tag 2, perform classification of images using
the nearest neighbor rule. We use the Caltech 101 database [Fei-Fei and Perona(2005)]
that consists of 101 categories with about 40 to 800 images per category, and
select the airplanes (800 images) and Faces (436 images) categories. For
each color image, we choose the intensity histogram^2 as its feature vector.
We compute a centroid for each of the airplanes/Faces categories, and classify
all images according to the nearest neighbor rule between those two class
histogram centroids and the query image histogram with respect to the
symmetrized $\alpha$-skew Jensen divergence. Figure 2 displays the performance plot
(percentage of correct classification) of this binary classification task. We
empirically observe that performance may vary with $\alpha$ as expected, and that
the best $\alpha$ needs to be tuned according to the training data, hinting at the
underlying geometry of data-sets. Here, the best correct classification rate
(about 88%) is obtained for $\alpha = \frac{1}{4}$, that is for the mid-divergence
between Jeffreys and Jensen-Shannon divergences.
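
The nearest-centroid rule described above can be sketched as follows. The centroid and query histograms here are made-up four-bin arrays, not data from the Caltech 101 experiment, and `classify` is a hypothetical helper name; the divergence is the discrete symmetrized $\alpha$-skew Jensen divergence with generator $F(x) = x \log x$.

```python
import math

def F(x):
    """Shannon information generator F(x) = x log x, with F(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def sym_skew_jensen(p, q, alpha):
    """Discrete symmetrized alpha-skew Jensen divergence, alpha not in {0, 1}."""
    total = sum(F(a) + F(b)
                - F(alpha * a + (1 - alpha) * b)
                - F((1 - alpha) * a + alpha * b)
                for a, b in zip(p, q))
    return total / (2 * alpha * (1 - alpha))

def classify(hist, centroids, alpha):
    """Return the label of the class centroid closest to the query histogram."""
    return min(centroids,
               key=lambda label: sym_skew_jensen(hist, centroids[label], alpha))

# Made-up class centroids and query, for illustration only.
centroids = {"airplanes": [0.1, 0.2, 0.3, 0.4], "Faces": [0.4, 0.3, 0.2, 0.1]}
query = [0.15, 0.2, 0.3, 0.35]
print(classify(query, centroids, alpha=0.25))
```

Sweeping `alpha` over a grid and recording the classification rate on labeled data is one simple way to tune the parameter, in the spirit of Figure 2.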



2 We convert (R, G, B) colors into corresponding intensities
I = 0.3R + 0.596G + 0.11B, and ensure (by adding a small non-zero constant) that
histogram bins are never empty in order to have proper multinomial distributions.

6 Concluding remarks

In this paper, we have introduced a novel parametric family of symmetric
divergences based on Jensen's inequality called symmetrized $\alpha$-skew Jensen
divergences. Instantiating this family for the Shannon information generator, we
have exhibited a one-parameter family of symmetrized Kullback-Leibler
divergences. Furthermore, we showed that for distributions belonging to the same
exponential family, the symmetrized $\alpha$-Bhattacharyya divergence amounts to
computing a symmetrized $\alpha$-Jensen divergence defined on the parameter space,
thus yielding a closed-form formula. We then reported an iterative algorithm
for computing the centroid with respect to this class of divergences.

For applications like information retrieval requiring symmetric statistical
distances, the choice is therefore no longer to decide between Jeffreys and
Jensen-Shannon divergences, but rather to choose or tune the best $\alpha$
parameter according to the application and input data.